TUTORIALS

Welcome to LADAL Tutorials!

Ready to explore data science, statistics, and text analysis?

Our tutorials guide you from beginner basics to advanced research methods, covering everything from fundamental programming concepts to cutting-edge natural language processing techniques. Whether you’re a complete beginner or an experienced researcher looking to expand your toolkit, LADAL has resources to support your journey.


Where Should You Start?

Find Your Path

Complete Beginners:
Start with R Basics to build your foundation, then explore Data Visualization to see your data come to life.

Some R Experience:
Jump straight to Text Analytics or Statistics depending on your research interests.

Exploring Possibilities:
Browse Introduction to Text Analysis and Case Studies to see what’s possible.

Need Structure:
Check out LADAL Courses for organized learning paths.

Quick Decision Guide

Answer these questions to find your starting point:

  1. Do you know R?
    • ❌ No → Start with R Basics
    • ✅ Yes → Continue to question 2
  2. What’s your goal?
  3. What’s your experience level?
    • Beginner → Start at the beginning of your chosen section
    • Intermediate → Skip to specific topics
    • Advanced → Jump to specialized tutorials or case studies

Structured Learning Paths

LADAL Courses

Prefer step-by-step guidance? Our LADAL Courses organize tutorials into clear learning sequences.

Choose from:

| Course Type | Duration | Best For |
|---|---|---|
| Short Courses | 5-10 tutorials | Focused skill development |
| Long Courses | Full semester | Comprehensive training |

Each course includes:
- Tutorial sequence
- Suggested readings
- Lecture content ideas
- Learning objectives
- Practice exercises

Courses take the guesswork out of what to learn next.


Tutorial Categories

Our tutorials are organized into seven main categories. Each builds on foundational knowledge while allowing you to follow your specific interests.

Category Overview

| Category | Focus | Prerequisites | Tutorials |
|---|---|---|---|
| Data Science Basics | Foundational concepts for digital research | None | 5 tutorials |
| R Basics | Programming fundamentals in R | None | 7 tutorials |
| Data Visualization | Creating compelling visual representations | R Basics | 3 tutorials |
| Statistics | Statistical methods and analysis | R Basics, Data Visualization (recommended) | 10 tutorials |
| Text Analytics | Computational text analysis techniques | R Basics | 14 tutorials |
| Case Studies | Real-world research applications | R Basics + relevant sections | 8 tutorials |
| How-Tos | Quick practical guides | Varies by tutorial | 5 tutorials |


Data Science Basics

Section Overview

What you’ll learn: Best practices for digital research, reproducibility, and quantitative reasoning

Prerequisites: None - perfect for complete beginners!

Time investment: 5-8 hours total

Why start here: Build theoretical foundations that support all practical tutorials

These tutorials provide the theoretical background for the practical tutorials in other sections. While you can jump straight to hands-on work, understanding these concepts will deepen your comprehension and improve your research practices.

Tutorials in This Section

1. Working with Computers

Tutorial Link

What you’ll learn:
- Organizing files and folders efficiently
- Keeping your computer running smoothly
- Storing data safely and systematically
- Best practices for digital workflows

Why it matters: Good digital hygiene prevents data loss and saves time

Time: ~1 hour


2. Introduction to Data Management

Tutorial Link

What you’ll learn:
- Basic data management techniques
- Folder organization strategies
- File naming conventions
- Data documentation practices

Why it matters: Clean data management is the foundation of reproducible research

Time: ~1 hour


3. Reproducible Research

Tutorial Link

What you’ll learn:
- Principles of reproducibility
- Version control basics
- Documentation strategies
- Creating reproducible workflows

Why it matters: Reproducibility is increasingly required by journals and funders

Time: ~1.5 hours


4. Introduction to Quantitative Reasoning

Tutorial Link

What you’ll learn:
- Logical foundations of the scientific method
- History of quantitative thinking
- Philosophical underpinnings of data analysis
- Critical thinking about numbers

Why it matters: Understanding why methods work helps you apply them appropriately

Time: ~2 hours


5. Basic Concepts in Quantitative Research

Tutorial Link

What you’ll learn:
- Fundamental concepts in data analysis
- Variables, observations, and measurements
- Descriptive vs. inferential statistics
- Research design basics

Why it matters: These concepts underpin all statistical methods

Time: ~2 hours


Suggested Learning Path
  1. Start with Working with Computers and Data Management for practical skills
  2. Read Reproducible Research before starting actual analyses
  3. Use Quantitative Reasoning and Basic Concepts as references when needed

R Basics

Essential Foundation

This section is a prerequisite for most other LADAL tutorials.

R is the programming language used throughout LADAL. The skills covered here are assumed knowledge for Statistics, Data Visualization, Text Analytics, and Case Studies sections.

We strongly recommend completing these tutorials in order before moving on.

Why R?

Before diving in, you might wonder: why R? Check out our Why R? page for our reasoning.

Short answer:
- Free and open-source
- Widely used in data science and academic research
- Powerful text analysis capabilities
- Excellent visualization tools
- Large community and extensive package ecosystem
- Reproducible research workflow

Tutorials in This Section

1. Getting Started with R ⭐ START HERE

Tutorial Link

What you’ll learn:
- Installing R and RStudio
- Understanding the RStudio interface
- Basic R syntax and commands
- Working with variables and functions
- Your first R script

Why it matters: This is your foundation for everything else

Time: ~2-3 hours

Prerequisites: None


2. Loading and Saving Data

Tutorial Link

What you’ll learn:
- Reading different file formats (CSV, Excel, TXT)
- Importing data from URLs
- Saving data in various formats
- Working with file paths
- Handling data import issues

Why it matters: You need to get data in and out of R!

Time: ~1.5 hours

Prerequisites: Getting Started with R
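
To illustrate the round trip this tutorial covers, here is a minimal base R sketch that saves a small data frame as CSV and reads it back. The data frame `scores`, its columns, and the file name are invented for illustration; the tutorial itself may use other packages and file formats.

```r
# Hypothetical example data (not from the tutorial)
scores <- data.frame(
  speaker = c("A", "B", "C"),
  words   = c(120L, 95L, 143L),
  stringsAsFactors = FALSE
)

# Build a path inside a temporary directory so no existing file is overwritten
csv_path <- file.path(tempdir(), "scores.csv")

# Save the data frame as CSV without row names
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back; stringsAsFactors = FALSE keeps text columns as character
scores_loaded <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the structure of what was loaded
str(scores_loaded)
```

Building paths with `file.path()` rather than pasting strings together avoids problems with path separators on different operating systems.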


3. String Processing

Tutorial Link

What you’ll learn:
- Manipulating text data
- String operations (concatenate, split, replace)
- Working with the stringr package
- Common string processing tasks
- Text cleaning techniques

Why it matters: Essential for text analysis and data cleaning

Time: ~2 hours

Prerequisites: Getting Started with R
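
The operations listed above (concatenate, split, replace, clean) can be sketched in base R as follows. This is only an illustration with an invented sentence; the tutorial works with the stringr package, which offers equivalent functions under different names.

```r
# Hypothetical example sentence with stray whitespace
txt <- "  The quick brown fox  "

# Clean: remove leading and trailing whitespace
clean <- trimws(txt)

# Concatenate: join two strings with a space
joined <- paste(clean, "jumps.")

# Split: break the sentence into words at each space
words <- strsplit(clean, " ", fixed = TRUE)[[1]]

# Replace: swap one word for another (fixed = TRUE disables regex matching)
replaced <- sub("quick", "slow", clean, fixed = TRUE)

# Lower-case the words and count the characters in each
lower   <- tolower(words)
n_chars <- nchar(words)
```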


4. Regular Expressions

Tutorial Link

What you’ll learn:
- Pattern matching basics
- Regular expression syntax
- Finding and replacing patterns
- Advanced text search techniques
- Practical regex applications

Why it matters: Powerful tool for sophisticated text processing

Time: ~2-3 hours

Prerequisites: String Processing

Difficulty: ⭐⭐ Intermediate
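
As a rough sketch of pattern matching, extraction, and replacement, the following base R snippet searches three invented strings for a simple number pattern. The texts and the pattern are assumptions made for this example only.

```r
# Hypothetical example texts
texts <- c("Call 555-1234 now", "No number here", "Fax: 555-9876")

# Pattern: three digits, a hyphen, four digits
pattern <- "[0-9]{3}-[0-9]{4}"

# Find: which texts contain the pattern?
has_number <- grepl(pattern, texts)

# Extract: the first match from each text that contains one
numbers <- regmatches(texts, regexpr(pattern, texts))

# Replace: mask every match with a placeholder
masked <- gsub(pattern, "<PHONE>", texts)
```

Note that `regmatches()` combined with `regexpr()` silently drops texts without a match, so `numbers` can be shorter than `texts`.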


5. Handling Tables in R

Tutorial Link

What you’ll learn:
- Creating and manipulating data frames
- Subsetting and filtering data
- Reshaping data (wide vs. long format)
- Merging and joining tables
- Tabulating data

Why it matters: Most data comes in tabular format

Time: ~2 hours

Prerequisites: Getting Started with R
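
The table operations named above (joining, filtering, tabulating) might look like this in base R. Both data frames are invented for illustration, and the tutorial may use different functions or packages for the same tasks.

```r
# Hypothetical example tables sharing an id column
speakers <- data.frame(id = 1:4,
                       group = c("L1", "L2", "L1", "L2"),
                       stringsAsFactors = FALSE)
scores   <- data.frame(id = 1:4,
                       score = c(78, 65, 90, 72))

# Join the two tables on the shared id column
combined <- merge(speakers, scores, by = "id")

# Filter: keep only rows with a score above 70
high <- combined[combined$score > 70, ]

# Tabulate: count rows per group
group_counts <- table(combined$group)

# Summarise: mean score per group
group_means <- aggregate(score ~ group, data = combined, FUN = mean)
```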


6. Reproducibility with R

Tutorial Link

What you’ll learn:
- R Markdown basics
- Creating reproducible reports
- Version control with Git
- R Projects for organization
- Documenting your code

Why it matters: Professional research requires reproducibility

Time: ~2-3 hours

Prerequisites: All previous R Basics tutorials


Suggested Learning Path for R Basics

Week 1:
1. Why R? (reading)
2. Getting Started with R
3. Loading and Saving Data

Week 2:
4. String Processing
5. Handling Tables in R

Week 3:
6. Regular Expressions
7. Reproducibility with R

Practice throughout: Complete all exercises in each tutorial before moving on!


Data Visualization

Section Overview

What you’ll learn: Creating professional, publication-quality visualizations

Prerequisites: R Basics (especially Getting Started and Handling Tables)

Time investment: 6-10 hours total

Key skill: Master ggplot2, R’s powerful visualization framework

Effective visualization is crucial for understanding your data and communicating findings. These tutorials teach principles of good design alongside technical skills.

Tutorials in This Section

1. Introduction to Data Visualization ⭐ START HERE

Tutorial Link

What you’ll learn:
- Principles of effective visualization
- Introduction to ggplot2
- Creating basic plots (scatter, bar, line, box)
- Customizing colors, labels, and themes
- Saving publication-quality figures

Why it matters: Foundation for all R visualization

Time: ~3-4 hours

Prerequisites: R Basics


2. Mastering Data Visualization with R

Tutorial Link

What you’ll learn:
- Advanced plot types
- Faceting and small multiples
- Complex data transformations for visualization
- Combining multiple plots
- Creating interactive visualizations

Why it matters: Advanced techniques for complex data

Time: ~3-4 hours

Prerequisites: Introduction to Data Visualization

Difficulty: ⭐⭐ Intermediate to Advanced


3. Showcase: Creating Typological Maps

Tutorial Link

What you’ll learn:
- Interactive map creation with leaflet
- Plotting geographical data
- Adding markers and popups
- Customizing map appearance
- Publishing interactive maps

Why it matters: Essential for spatial/geographical research

Time: ~2 hours

Prerequisites: Introduction to Data Visualization

Special focus: Linguistic typology, dialectology, sociolinguistics


Visualization Best Practices

Before creating any plot, ask:
1. What am I trying to communicate?
2. Who is my audience?
3. What plot type best represents this data?
4. Is my design accessible (colorblind-friendly)?
5. Are my axes clearly labeled?

Remember: A bad visualization is worse than no visualization!


Statistics

Section Overview

What you’ll learn: Statistical methods from descriptive to advanced modeling

Prerequisites: R Basics; Data Visualization recommended

Time investment: 20-30 hours total (depending on selection)

Flexibility: Tutorials don’t need to be completed in order (except where noted)

This section covers statistical methods from foundational concepts to advanced techniques. Start with Descriptive Statistics and Basic Inferential Statistics, then choose tutorials relevant to your research.

Getting Started: Core Foundations

1. Descriptive Statistics ⭐ START HERE

Tutorial Link

What you’ll learn:
- Measures of central tendency (mean, median, mode)
- Measures of dispersion (variance, standard deviation)
- Creating summary statistics
- Exploring data distributions
- Identifying outliers

Why it matters: Always describe your data before analyzing it!

Time: ~2 hours

Prerequisites: R Basics
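
A minimal sketch of the measures listed above, using base R and an invented sample of sentence lengths. The 1.5 × IQR rule used here for outliers is one common convention, assumed for illustration; the tutorial may use a different criterion.

```r
# Hypothetical sample: sentence lengths in words
sent_len <- c(12, 15, 9, 15, 21, 15, 8, 40)

# Central tendency
mean_len   <- mean(sent_len)
median_len <- median(sent_len)

# Base R has no built-in statistical mode, so tabulate and take the most frequent value
counts   <- table(sent_len)
mode_len <- as.numeric(names(counts)[which.max(counts)])

# Dispersion
var_len <- var(sent_len)
sd_len  <- sd(sent_len)

# Outliers: values more than 1.5 * IQR below the first or above the third quartile
q        <- quantile(sent_len, c(0.25, 0.75))
fence    <- 1.5 * IQR(sent_len)
outliers <- sent_len[sent_len < q[1] - fence | sent_len > q[2] + fence]
```

In this sample the single long sentence pulls the mean above the median, which is why it is worth reporting both.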


2. Basic Inferential Statistics ⭐ START HERE

Tutorial Link

What you’ll learn:
- Null hypothesis testing
- t-tests (one-sample, two-sample, paired)
- Chi-square tests
- Correlation analysis
- Interpreting p-values
- Avoiding common pitfalls

Why it matters: Foundation for all inferential methods

Time: ~3 hours

Prerequisites: Descriptive Statistics
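
The three tests named above are all available in base R. The sketch below runs each on invented data; the numbers, variable names, and group labels are assumptions for illustration and carry no empirical meaning.

```r
# Hypothetical reaction times (ms) for two independent groups
group_a <- c(512, 498, 530, 475, 520, 505, 490, 515)
group_b <- c(560, 545, 580, 552, 570, 538, 565, 549)

# Two-sample t-test (R runs Welch's version by default)
t_res <- t.test(group_a, group_b)

# Chi-square test of independence on a 2 x 2 table of counts
counts <- matrix(c(30, 10, 20, 40), nrow = 2,
                 dimnames = list(variant = c("A", "B"),
                                 region  = c("North", "South")))
chi_res <- chisq.test(counts)

# Pearson correlation between two hypothetical numeric variables
word_len <- c(3, 5, 4, 7, 6, 9, 8, 10)
rt       <- c(480, 500, 495, 530, 520, 560, 545, 575)
cor_res  <- cor.test(word_len, rt)

# Each result stores its p-value in the p.value element
p_values <- c(t = t_res$p.value, chi = chi_res$p.value, cor = cor_res$p.value)
```

A p-value only indicates how surprising the data would be under the null hypothesis; it does not measure effect size, so report estimates and confidence intervals alongside it.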


Regression and Modeling

3. Regression Analysis

Tutorial Link

What you’ll learn:
- Simple and multiple regression
- Linear models in R
- Model diagnostics
- Interpreting coefficients
- Checking assumptions
- Reporting results

Why it matters: Among the most widely used methods in the language sciences

Time: ~4-5 hours

Prerequisites: Basic Inferential Statistics

Difficulty: ⭐⭐ Intermediate
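
As an illustration of fitting and inspecting a linear model, the following sketch simulates data and fits a multiple regression with `lm()`. The variables, effect sizes, and noise level are invented assumptions, not results from any study.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated (hypothetical) data: reaction time depends on word length and log frequency
n           <- 100
word_length <- sample(3:10, n, replace = TRUE)
log_freq    <- rnorm(n, mean = 5, sd = 1)
rt          <- 400 + 15 * word_length - 20 * log_freq + rnorm(n, sd = 25)
dat         <- data.frame(rt, word_length, log_freq)

# Multiple linear regression with two predictors
fit <- lm(rt ~ word_length + log_freq, data = dat)

# Coefficient table: estimates, standard errors, t-values, p-values
coef_table <- coef(summary(fit))

# Basic diagnostics: residuals should scatter evenly around zero
res         <- residuals(fit)
fitted_vals <- fitted(fit)
```

Calling `plot(fitted_vals, res)` gives a quick visual check of the linearity and constant-variance assumptions.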


4. Mixed-Effects Models

Tutorial Link

What you’ll learn:
- When to use mixed-effects models
- Random vs. fixed effects
- Model specification in lme4
- Comparing nested models
- Interpreting random effects
- Reporting mixed models

Why it matters: Essential for hierarchical/repeated measures data

Time: ~5-6 hours

Prerequisites: Regression Analysis

Difficulty: ⭐⭐⭐ Advanced


Machine Learning and Classification

5. Tree-Based Models

Tutorial Link

What you’ll learn:
- Decision trees
- Random forests
- Variable importance
- Classification and regression trees
- Ensemble methods
- Model interpretation

Why it matters: Powerful for both prediction and interpretation

Time: ~4 hours

Prerequisites: Basic Inferential Statistics

Difficulty: ⭐⭐ Intermediate


6. Cluster and Correspondence Analysis

Tutorial Link

What you’ll learn:
- Hierarchical clustering
- K-means clustering
- Correspondence analysis
- Determining optimal cluster numbers
- Visualizing clusters
- Interpreting results

Why it matters: Discover patterns in unlabeled data

Time: ~3-4 hours

Prerequisites: Descriptive Statistics


Semantic and Similarity Analysis

7. Introduction to Lexical Similarity

Tutorial Link

What you’ll learn:
- Measuring text similarity
- String distance metrics
- Edit distance
- Comparing documents
- Applications in linguistics

Why it matters: Foundational for text comparison tasks

Time: ~2-3 hours

Prerequisites: R Basics, String Processing
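
Edit distance can be computed in base R with `adist()`, which returns the Levenshtein distance by default. The word pairs below and the normalisation by the longer word's length are assumptions chosen for illustration; other similarity measures are equally valid.

```r
# Hypothetical word pair
a <- "kitten"
b <- "sitting"

# Levenshtein distance: minimum number of insertions, deletions, and substitutions
d <- adist(a, b)[1, 1]

# One simple normalisation: similarity between 0 (different) and 1 (identical)
sim <- 1 - d / max(nchar(a), nchar(b))

# Pairwise distances between several words at once
words       <- c("colour", "color", "collar")
dist_matrix <- adist(words)
dimnames(dist_matrix) <- list(words, words)
```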


8. Semantic Vector Space Models

Tutorial Link

What you’ll learn:
- Vector space models
- Distributional semantics
- Word similarity measures
- Semantic clustering
- Applications in NLP

Why it matters: Computational approach to meaning

Time: ~4-5 hours

Prerequisites: Basic statistics, some linear algebra helpful

Difficulty: ⭐⭐⭐ Advanced


Advanced Methods

9. Dimension Reduction Methods

Tutorial Link

What you’ll learn:
- Principal Component Analysis (PCA)
- Factor Analysis
- Multidimensional Scaling (MDS)
- When to use each method
- Interpreting components/factors
- Visualization techniques

Why it matters: Simplify complex multivariate data

Time: ~4 hours

Prerequisites: Basic statistics, correlation

Difficulty: ⭐⭐⭐ Advanced


10. Power Analysis

Tutorial Link

What you’ll learn:
- Determining sample size
- Power calculations
- Effect size estimation
- Planning studies
- Post-hoc power analysis

Why it matters: Design adequately powered studies

Time: ~2-3 hours

Prerequisites: Basic Inferential Statistics
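
A sample size calculation of the kind described above can be sketched with `power.t.test()` from base R. The effect size, significance level, and target power are conventional illustrative values, assumed here rather than taken from the tutorial.

```r
# Sample size per group for a two-sample t-test
# Assumed inputs: standardised effect d = 0.5, alpha = 0.05, power = 0.80
res         <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(res$n)  # round up to whole participants

# The reverse question: what power does a study with 30 participants per group have?
achieved <- power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05)$power
```

Leaving exactly one of `n`, `delta`, `sd`, `sig.level`, or `power` unspecified tells the function which quantity to solve for.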


Suggested Learning Paths

For Experimental Research:
1. Descriptive Statistics
2. Basic Inferential Statistics
3. Regression Analysis
4. Mixed-Effects Models (if hierarchical data)
5. Power Analysis (for study planning)

For Corpus Linguistics:
1. Descriptive Statistics
2. Basic Inferential Statistics
3. Regression Analysis
4. Cluster Analysis
5. Correspondence Analysis

For Computational Linguistics:
1. Basic Inferential Statistics
2. Tree-Based Models
3. Semantic Vector Space Models
4. Dimension Reduction
5. Cluster Analysis

For Sociolinguistics:
1. Descriptive Statistics
2. Basic Inferential Statistics
3. Regression Analysis
4. Mixed-Effects Models
5. Correspondence Analysis


Text Analytics

Section Overview

What you’ll learn: Computational methods for analyzing text data

Prerequisites: R Basics required; String Processing and Regular Expressions highly recommended

Time investment: 30-40 hours total for all tutorials

Flexibility: Jump to specific topics after introductory tutorials

This section covers methods for computational text analysis, from basic concordancing to advanced NLP techniques. Start with the introductory tutorials, then explore methods relevant to your research.

Getting Started: Foundations

1. Introduction to Text Analysis ⭐ START HERE

Tutorial Link

What you’ll learn:
- What is text analytics?
- Key concepts and terminology
- Text as data
- Overview of methods
- Common applications
- Research design considerations

Why it matters: Conceptual foundation for all text analysis

Time: ~2 hours (reading/concepts)

Prerequisites: None


2. Practical Overview of Text Analytics Methods ⭐ START HERE

Tutorial Link

What you’ll learn:
- Concordancing basics
- Word frequency analysis
- Collocations
- Keywords
- Text classification
- POS tagging
- Named entity recognition
- Dependency parsing

Why it matters: Hands-on introduction to core methods

Time: ~4-5 hours

Prerequisites: R Basics


Core Text Analysis Methods

3. Finding Words in Text: Concordancing

Tutorial Link

What you’ll learn:
- Creating KWIC (keyword-in-context) displays
- Simple and complex search patterns
- Using regular expressions
- Filtering and sorting concordances
- Analyzing context
- Exporting results

Why it matters: Foundation of corpus linguistics

Time: ~3 hours

Prerequisites: R Basics, String Processing
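
To show what a KWIC display is, here is a minimal base R sketch that tokenises a sentence and returns the context around each hit. The helper `kwic()`, its arguments, and the mini-corpus are hypothetical and written for this example; the tutorial may rely on a dedicated package instead.

```r
# Hypothetical mini-corpus
text <- "the cat sat on the mat and the dog sat on the rug"

# Tokenise on whitespace
tokens <- strsplit(text, "\\s+")[[1]]

# Simple KWIC helper: up to 'window' tokens of context on each side of every hit
kwic <- function(tokens, keyword, window = 2) {
  hits <- which(tokens == keyword)
  n    <- length(tokens)
  # Collapse the tokens at the given positions into one string
  context <- function(idx) paste(tokens[idx], collapse = " ")
  data.frame(
    left    = vapply(hits, function(i)
      context(seq(max(1, i - window), length.out = min(window, i - 1))),
      character(1)),
    keyword = tokens[hits],
    right   = vapply(hits, function(i)
      context(seq(i + 1, length.out = min(window, n - i))),
      character(1)),
    stringsAsFactors = FALSE
  )
}

# One row per occurrence of the search word
sat_lines <- kwic(tokens, "sat")
```

Hits at the very start or end of the text simply get an empty string as their left or right context.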


4. Collocation and N-gram Analysis

Tutorial Link

What you’ll learn:
- Identifying collocations
- Measuring collocation strength
- N-gram extraction
- Visualizing semantic links
- Statistical significance testing
- Applications in phraseology

Why it matters: Uncover word associations and phraseological patterns

Time: ~3-4 hours

Prerequisites: Concordancing, Basic Statistics


5. Keyness and Keyword Analysis

Tutorial Link

What you’ll learn:
- Calculating keyness
- Identifying distinctive vocabulary
- Comparing corpora
- Statistical measures of keyness
- Visualizing keywords
- Interpreting results

Why it matters: Find what makes a text distinctive

Time: ~3 hours

Prerequisites: Basic Statistics


Visualization and Networks

6. Network Analysis

Tutorial Link

What you’ll learn:
- Creating network graphs
- Visualizing relationships
- Network metrics
- Community detection
- Applications to text data
- Interactive networks

Why it matters: Powerful visualization for relationships

Time: ~3-4 hours

Prerequisites: R Basics; Data Visualization helpful


Advanced NLP Methods

7. Topic Modeling

Tutorial Link

What you’ll learn:
- Latent Dirichlet Allocation (LDA)
- Determining optimal topic numbers
- Interpreting topics
- Visualizing topic models
- Applications and limitations

Why it matters: Discover hidden themes in large text collections

Time: ~4-5 hours

Prerequisites: Basic Statistics

Difficulty: ⭐⭐ Intermediate


8. Sentiment Analysis

Tutorial Link

What you’ll learn:
- Sentiment lexicons
- Calculating sentiment scores
- Sentiment over time
- Domain-specific sentiment
- Limitations and cautions

Why it matters: Quantify emotional tone in text

Time: ~3 hours

Prerequisites: Basic text processing


9. Tagging and Parsing

Tutorial Link

What you’ll learn:
- Part-of-speech tagging
- Dependency parsing
- Using udpipe
- Extracting grammatical patterns
- Annotating corpora

Why it matters: Essential for grammatical analysis

Time: ~3-4 hours

Prerequisites: Basic linguistics knowledge helpful

Difficulty: ⭐⭐ Intermediate


10. Word Embeddings and Vector Semantics ⭐ NEW!

Tutorial Link

What you’ll learn:
- What are word embeddings?
- Training word2vec models
- Using pre-trained embeddings (GloVe, fastText)
- Finding similar words
- Word analogies (king - man + woman ≈ queen)
- Visualizing embeddings
- Research applications (semantic change, bias detection)

Why it matters: Core technique in modern computational semantics

Time: ~5-6 hours

Prerequisites: Basic Statistics, some linear algebra helpful

Difficulty: ⭐⭐⭐ Advanced
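
The word analogy idea can be illustrated with cosine similarity on toy vectors. The three-dimensional "embeddings" below are invented by hand so that the arithmetic works out; real embeddings are learned from large corpora, have hundreds of dimensions, and give only approximate analogies.

```r
# Toy 3-dimensional vectors, invented for illustration only
emb <- rbind(
  king  = c(0.9, 0.8, 0.1),
  queen = c(0.9, 0.1, 0.8),
  man   = c(0.1, 0.9, 0.1),
  woman = c(0.1, 0.2, 0.8),
  apple = c(0.5, 0.5, 0.5)
)

# Cosine similarity: dot product divided by the product of the vector lengths
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Analogy vector: king - man + woman
target <- emb["king", ] - emb["man", ] + emb["woman", ]

# Rank the words not used in the query by similarity to the analogy vector
candidates <- setdiff(rownames(emb), c("king", "man", "woman"))
sims       <- sapply(candidates, function(w) cosine(emb[w, ], target))
best       <- names(which.max(sims))
```

Excluding the query words from the candidates is standard practice, because the nearest neighbour of the analogy vector is otherwise often one of the input words.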


11. Automated Text Summarization

Tutorial Link

What you’ll learn:
- Extractive summarization
- Sentence scoring methods
- TextRank algorithm
- Creating automatic summaries
- Evaluation methods

Why it matters: Condense large texts automatically

Time: ~2-3 hours

Prerequisites: Basic text processing


12. Spell Checking

Tutorial Link

What you’ll learn:
- Implementing spell checkers
- Handling OCR errors
- Custom dictionaries
- Suggesting corrections
- Batch processing

Why it matters: Clean text data, especially from OCR

Time: ~2 hours

Prerequisites: String Processing


Suggested Learning Paths by Research Area

Corpus Linguistics:
1. Introduction to Text Analysis
2. Practical Overview
3. Concordancing
4. Collocations
5. Keywords
6. Tagging and Parsing

Computational Linguistics:
1. Practical Overview
2. Topic Modeling
3. Word Embeddings
4. Sentiment Analysis
5. Tagging and Parsing
6. Network Analysis

Digital Humanities:
1. Introduction to Text Analysis
2. Concordancing
3. Topic Modeling
4. Sentiment Analysis
5. Word Embeddings
6. Text Summarization

Discourse Analysis:
1. Concordancing
2. Collocations
3. Keywords
4. Sentiment Analysis
5. Network Analysis

Historical Linguistics:
1. Concordancing
2. Collocations
3. Keywords
4. Word Embeddings (semantic change)
5. Network Analysis


Case Studies

Section Overview

What you’ll learn: Real-world applications of LADAL methods

Prerequisites: R Basics + relevant sections for each case study

Why explore: See how methods combine to answer actual research questions

Approach: Choose case studies relevant to your interests

These tutorials show complete research workflows from question to conclusion, demonstrating how to combine methods taught in other LADAL tutorials.

Tutorials in This Section

1. Classifying American Speeches

Tutorial Link

Research question: Can we automatically classify political speeches by party?

What you’ll learn:
- Document classification workflow
- Feature extraction from text
- Machine learning for text
- Model evaluation
- Interpreting results

Methods used: Text processing, classification, evaluation

Time: ~4 hours

Prerequisites: R Basics, basic statistics

Created by: Gerold Schneider and Max Lauber for ATAP


2. Corpus Linguistics with R

Tutorial Link

Research questions: Various corpus-based research scenarios

What you’ll learn:
- Complete corpus analysis workflows
- Frequency analysis
- Dispersion and distribution
- Comparative analysis
- Visualization

Methods used: Multiple corpus linguistics techniques

Time: ~5-6 hours

Prerequisites: R Basics, Concordancing, Basic Statistics


3. Analysing Learner Language

Tutorial Link

Research question: How does learner language differ from native speaker language?

What you’ll learn:
- Learner corpus compilation
- Error analysis
- Comparing native and non-native data
- Statistical testing
- Pedagogical applications

Methods used: Corpus analysis, statistics, visualization

Time: ~4-5 hours

Prerequisites: R Basics, Statistics, Text Analytics

Special focus: Second language acquisition, language teaching


4. Lexicography and Creating Dictionaries

Tutorial Link

Research question: How can we create dictionaries computationally?

What you’ll learn:
- Dictionary creation principles
- Finding synonyms computationally
- Semantic similarity
- Entry generation
- Format and structure

Methods used: Semantic analysis, similarity measures

Time: ~3-4 hours

Prerequisites: R Basics, Text Analytics; Word Embeddings helpful


5. Visualising Survey and Questionnaire Data

Tutorial Link

Research question: How do we analyze and present questionnaire data?

What you’ll learn:
- Survey design considerations
- Likert scale analysis
- Visualizing categorical data
- Statistical testing for surveys
- Reporting best practices

Methods used: Descriptive statistics, visualization, inferential tests

Time: ~4 hours

Prerequisites: R Basics, Data Visualization, Basic Statistics


6. Creating Vowel Charts in R

Tutorial Link

Research question: How do we visualize vowel formants?

What you’ll learn:
- Extracting formants from Praat
- Processing acoustic data in R
- Creating vowel plots
- Customizing phonetic visualizations
- Comparing speakers/varieties

Methods used: Acoustic analysis, specialized visualization

Time: ~3 hours

Prerequisites: R Basics, Data Visualization, Praat

Special focus: Phonetics, sociolinguistics


7. Computational Literary Stylistics

Tutorial Link

Research question: Can we computationally analyze literary style?

What you’ll learn:
- Stylometric analysis
- Authorship attribution
- Measuring style
- Comparing authors
- Visualizing stylistic features

Methods used: Text analytics, statistics, visualization, clustering

Time: ~5-6 hours

Prerequisites: R Basics, Text Analytics, Statistics

Special focus: Digital humanities, literary studies


8. Reinforcement Learning and Text Summarization

Tutorial Link

Research question: Can reinforcement learning improve text summarization?

What you’ll learn:
- Reinforcement learning basics
- Applying RL to NLP
- Text summarization with RL
- Evaluation methods
- Advanced NLP applications

Methods used: Machine learning, NLP, summarization

Time: ~6-7 hours

Prerequisites: R Basics, Text Analytics, basic ML knowledge

Difficulty: ⭐⭐⭐ Advanced


How to Use Case Studies

For Learning:
1. Read the research question first
2. Think about how you might approach it
3. Work through the tutorial
4. Compare to your approach

For Inspiration:
- See how methods combine in practice
- Adapt workflows to your data
- Learn research design from examples

For Teaching:
- Use as complete examples in courses
- Show students real research applications
- Demonstrate problem-solving process


How-Tos

Section Overview

What you’ll learn: Quick, focused solutions to specific tasks

Prerequisites: Varies by tutorial (usually just R Basics)

Time investment: 1-3 hours per tutorial

Best for: Solving immediate practical problems

These tutorials provide quick, practical guides for common data tasks. Use them as references when you need to accomplish specific objectives.

Tutorials in This Section

1. Converting PDFs to Text

Tutorial Link

Task: Extract text from PDF files

What you’ll learn:
- PDF text extraction
- Optical Character Recognition (OCR)
- Handling scanned documents
- Batch processing PDFs
- Saving to text files

When you need this: Working with PDFs, digitizing documents

Time: ~1.5 hours


2. Creating R Notebooks with Markdown

Tutorial Link

Task: Create reproducible analysis documents

What you’ll learn:
- R Markdown basics
- Formatting with Markdown
- Integrating code and text
- Creating professional reports
- Exporting to multiple formats

When you need this: Documenting analyses, creating reports

Time: ~2 hours


3. Creating Free Online eBooks with bookdown

Tutorial Link

Task: Publish online books with R

What you’ll learn:
- Setting up bookdown
- Organizing chapters
- Cross-referencing
- Publishing to GitHub Pages
- Customizing appearance

When you need this: Course books, long-form documentation, publishing

Time: ~2-3 hours

Prerequisites: R Markdown basics


4. Creating Interactive Jupyter Notebooks

Tutorial Link

Task: Build interactive computational notebooks

What you’ll learn:
- Setting up Jupyter
- Creating interactive notebooks
- Launching from GitHub
- Combining code, visualizations, and narrative
- Sharing interactive content

When you need this: Teaching, interactive tutorials, live documentation

Time: ~2 hours

Note: Uses Python alongside R


5. Downloading Texts from Project Gutenberg

Tutorial Link

Task: Access public domain texts programmatically

What you’ll learn:
- Searching Project Gutenberg
- Downloading texts
- Cleaning Gutenberg texts
- Batch downloading
- Building custom corpora

When you need this: Building literary corpora, accessing public domain texts

Time: ~1 hour


Quick Reference Guide

Need to:
- Extract text from PDFs → Converting PDFs to Text
- Document your analysis → R Notebooks
- Create a course book → bookdown
- Build interactive tutorials → Jupyter Notebooks
- Get literary texts → Project Gutenberg


Getting Help and Support

Troubleshooting

If you encounter problems:

  1. Check prerequisites: Make sure you’ve completed required tutorials
  2. Read error messages carefully: They often tell you exactly what’s wrong
  3. Search online: Copy error messages into Google/Stack Overflow
  4. Check package versions: Update packages with update.packages()
  5. Restart R: Sometimes a fresh session solves issues

Common Issues and Solutions

Frequent Problems

“Package not found”
→ Install it: install.packages("package_name")

“Object not found”
→ Did you load the package with library()?
→ Did you run the code that creates that object?

“File not found”
→ Check your working directory: getwd()
→ Use here::here() for robust file paths

Code runs but gives unexpected results
→ Check for typos (R is case-sensitive!)
→ Verify your data loaded correctly
→ Print intermediate results to debug
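
The checks above can be run directly in the R console. The following is a sketch under stated assumptions: the object `my_data` and the file name are hypothetical, and `stats` is used as the example package only because it ships with every R installation.

```r
# "Package not found": check whether a package is installed before loading it
pkg           <- "stats"
pkg_installed <- requireNamespace(pkg, quietly = TRUE)
if (!pkg_installed) {
  message("Install it with: install.packages(\"", pkg, "\")")
}

# "Object not found": check whether an object exists in the current session
my_data <- data.frame(x = 1:3)            # hypothetical object
found_exact <- exists("my_data")          # TRUE, it was just created
found_other <- exists("My_Data")          # FALSE in a fresh session: R is case-sensitive

# "File not found": inspect the working directory and test a path before reading it
wd         <- getwd()
file_found <- file.exists(file.path(wd, "no_such_file_12345.csv"))

# Unexpected results: print intermediate objects to see what they contain
str(my_data)
```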

Additional Resources

R Help:
- ?function_name in R console
- RStudio Help pane
- RStudio Community
- Stack Overflow

Learning Resources:
- R for Data Science (free online book)
- RStudio Cheatsheets
- swirl (interactive R tutorials)

LADAL Support:
- Tutorial feedback form (link in each tutorial)
- Contact LADAL team (link in footer)


Tutorial Statistics

| Section | Tutorials | Est. Hours | Difficulty |
|---|---|---|---|
| Data Science Basics | 5 | 5-8 | Beginner |
| R Basics | 7 | 10-15 | Beginner |
| Data Visualization | 3 | 6-10 | Beginner-Intermediate |
| Statistics | 10 | 20-30 | Beginner-Advanced |
| Text Analytics | 14 | 30-40 | Beginner-Advanced |
| Case Studies | 8 | 25-35 | Intermediate |
| How-Tos | 5 | 8-12 | Beginner-Intermediate |
| TOTAL | 52 | 100-150 | All Levels |

Tutorial Collection Highlights
  • 52 comprehensive tutorials covering data science, statistics, and text analytics
  • 100-150 hours of learning content
  • All skill levels from complete beginner to advanced researcher
  • Regularly updated with new content and improvements
  • Free and open access for all learners worldwide

What’s New

Recent Additions

NEW (February 2026):
- Word Embeddings and Vector Semantics - Comprehensive tutorial on word2vec, GloVe, and semantic analysis
- Enhanced concordancing tutorial with advanced features
- Improved data visualization tutorial with more examples
- Updated text analytics overview with current methods

Coming Soon:
- Automatic Speech Recognition (ASR) with Whisper
- Advanced transformer models (BERT, GPT)
- Time series analysis for linguistics
- Bayesian methods for language data
- More case studies from current research

Want to contribute? Contact us about creating tutorials for LADAL!


